HAQWA: a Hash-based and Query Workload Aware Distributed RDF Store
Abstract
Like most data models encountered in the Big Data ecosystem, RDF stores manage large data sets by partitioning triples across a cluster of machines. Nevertheless, the graph nature of RDF data, together with its associated SPARQL query execution model, makes efficient data distribution more involved than in other data models, e.g., the relational one. In this paper, we propose a novel system characterized by a trade-off between the complexity of data partitioning and the efficiency of query answering in cases where a query workload is known. The prototype is implemented over the Apache Spark framework, ensuring high availability, fault tolerance and scalability. This short paper presents the main features of the system and highlights the omnipresence of parallel computation across data fragmentation and allocation, encoding and query processing tasks.
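The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of what hash-based triple partitioning commonly looks like, assuming triples are placed by a hash of their subject. The function names (`partition_of`, `hash_partition`), the use of CRC32, and the sample `ex:` triples are illustrative assumptions, not part of HAQWA.

```python
import zlib
from collections import defaultdict

def partition_of(subject: str, num_partitions: int) -> int:
    # Assumption: a stable hash of the subject picks the partition.
    # zlib.crc32 is used because Python's built-in hash() of strings
    # is randomized between interpreter runs.
    return zlib.crc32(subject.encode("utf-8")) % num_partitions

def hash_partition(triples, num_partitions):
    # Group (subject, predicate, object) triples by subject hash.
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[partition_of(s, num_partitions)].append((s, p, o))
    return dict(partitions)

# Hypothetical sample data for illustration only.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:acme"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
]
parts = hash_partition(triples, num_partitions=4)
```

Under this assumption, all triples sharing a subject land in the same partition, so a star-shaped query around one subject can be answered locally. A query that follows a path across subjects may still touch several partitions, which is the kind of case where knowledge of the query workload could guide additional allocation.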
Related resources
Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning
Massive volumes of big RDF data are growing beyond the performance capacity of conventional RDF data management systems operating on a single node. Applications using large RDF data demand efficient data partitioning solutions for supporting RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement a Semantic HAsh Partition...
Storage Balancing in P2P Based Distributed RDF Data Stores
Centralized RDF repositories have been designed to support RDF data storage and retrieval. However, they suffer from the traditional limitations of centralized approaches which are scalability and fault tolerance. Peer to Peer (P2P) networks can provide the scalability, fault-tolerance and robustness, features that the current solutions to local RDF storage do not provide which are needed by th...
Evaluating SPARQL Queries on Massive RDF Datasets
Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for ...
Adaptive Partitioning for Very Large RDF Data
State-of-the-art distributed RDF systems partition data across multiple computer nodes (workers). Some systems perform cheap hash partitioning, which may result in expensive query evaluation, while others apply heuristics aiming at minimizing inter-node communication during query evaluation. This requires an expensive data pre-processing phase, leading to high startup costs for very large RDF k...
Accessing XML Documents Using Semantic Meta Data in a P2P Environment
XGR (XML Data Grid) and BabelPeers are both data management systems based on distributed hash tables (DHT) that use the Pastry DHT to store data and meta data. XGR is based on the XML data model; BabelPeers uses the Resource Description Framework (RDF) for its data. XGR and BabelPeers have different but complementary functionality. On the one hand, XGR focuses on document-based storage of XML d...
Publication date: 2015